Similarity searching of sub-structured databases

ABSTRACT

A grouped database 10 includes a plurality of sub-databases 10A, each of which stores data sets having a specific classification in common ("male", "female", "annual income of 20,000,000 yen or more"). The grouped database 10 is searched in accordance with a given search condition. Given search conditions are stored successively in a search-result database 11. The sub-databases 10A contained in the grouped database 10 are reorganized so as to be in line with those given search conditions that have a high frequency of occurrence.

TECHNICAL FIELD

This invention relates to a database management method and apparatus as well as a database search method and apparatus.

BACKGROUND ART

When a database having a table format is searched, a sequential search is generally used as the search method. A database having a table format includes a plurality of data sets which contain identification-code data for identifying the data sets and classification data with regard to respective ones of a plurality of items. The search of a database is conducted by giving the classification to be searched as a keyword. However, since the structure of a database having a table format is fixed, it is difficult to conduct a search at high speed.

There is an apparatus in which data having common items are collected together by a filing device, which correlates and files a plurality of data extending over a plurality of items, thereby creating group data; the group data are filed in a magnetic disk device; group data corresponding to a search command are selected when the search command has been applied; and data corresponding to the search command are extracted from the group data selected. (For example, see Japanese Patent Application Laid-Open No. 2-19968.)

However, the common items in the group data are fixed on a group-data basis even in such a file apparatus. As a consequence, when a new search command different from that of the common item is applied, a rapid search is difficult.

DISCLOSURE OF THE INVENTION

An object of the present invention is to shorten the time needed for searching and conduct a comparatively efficient database search.

A database management method according to a first aspect of the present invention is a method of organizing one or a plurality of sub-databases from an original database comprising a collection of data sets containing identification numbers and including classification data on an item-by-item basis, characterized by giving a search condition designating one or a plurality of classifications to be searched, and organizing the one or a plurality of sub-databases comprising a collection of data sets having the designated classification of the given search condition in common.

An apparatus for executing the method according to the first aspect of the invention also is provided. Specifically, the apparatus is an apparatus for managing an original database comprising a collection of data sets containing identification numbers and including classification data on an item-by-item basis, characterized by comprising an input unit for accepting a search condition designating one or a plurality of classifications to be searched, and sub-database organizing means for organizing one or a plurality of sub-databases comprising a collection of data sets having in common the designated classification of the given search condition accepted by the input unit.

In accordance with the first aspect of the invention, when a search condition is given, one or a plurality of sub-databases comprising a collection of data sets having the designated classification of the given search condition in common are organized. Accordingly, the next time the same search condition or a closely resembling search condition is given, search time is shortened and an efficient search can be conducted.
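The first aspect can be pictured with a minimal Python sketch. Everything named here is an illustrative assumption rather than part of the specification: the toy rows in ORIGINAL, the encoding of a search condition as a label plus a predicate, and the helper names organize_sub_database and search. The sketch only shows the idea that a sub-database is organized the first time a condition is given and reused the next time.

```python
from typing import Callable, Dict, List

DataSet = Dict[str, object]            # one row: identification number plus per-item data
Condition = Callable[[DataSet], bool]  # hypothetical encoding of a search condition

# Toy original database (hypothetical rows, not taken from the figures).
ORIGINAL: List[DataSet] = [
    {"id": 1, "sex": "male", "occupation": "company employee", "income": 25_000_000},
    {"id": 2, "sex": "female", "occupation": "government employee", "income": 8_000_000},
    {"id": 3, "sex": "male", "occupation": "government employee", "income": 6_000_000},
    {"id": 4, "sex": "female", "occupation": "company employee", "income": 21_000_000},
]

# Sub-databases organized so far, keyed by the designated classification.
SUB_DATABASES: Dict[str, List[DataSet]] = {}


def organize_sub_database(label: str, condition: Condition) -> List[DataSet]:
    """Collect every data set that has the designated classification in common."""
    SUB_DATABASES[label] = [ds for ds in ORIGINAL if condition(ds)]
    return SUB_DATABASES[label]


def search(label: str, condition: Condition) -> List[object]:
    """Return identification numbers, organizing a sub-database on first use."""
    sub_db = SUB_DATABASES.get(label)
    if sub_db is None:
        # First time this condition is seen: scan the original database once.
        sub_db = organize_sub_database(label, condition)
    # Later searches with the same label read the sub-database directly.
    return [ds["id"] for ds in sub_db]


ids_male = search("male", lambda ds: ds["sex"] == "male")
ids_rich = search("income>=20M", lambda ds: ds["income"] >= 20_000_000)
```

In this sketch a repeated search for "male" no longer scans ORIGINAL, which is the source of the shortened search time described above.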

A database management method according to a second aspect of the invention is a method of reorganizing sub-databases when there is at least one of an original database comprising a collection of data sets containing identification codes and including classification data on an item-by-item basis, and one or a plurality of sub-databases, created based upon the original database, comprising a collection of data sets having a specific classification in common, characterized by storing designated classifications contained in a search condition whenever the search condition, which designates one or a plurality of classifications to be searched, is given, calculating degrees of similarity between designated classifications having a high frequency of occurrence among the stored designated classifications and a specific classification common to the sub-databases, and, in a case where there is a designated classification among the designated classifications having a high frequency of occurrence that exhibits a low degree of similarity with regard to any specific classification, creating a sub-database comprising a collection of data sets having this designated classification in common.

An apparatus for executing the method according to the second aspect of the invention also is provided. Specifically, the apparatus is an apparatus for managing a database which includes at least one of an original database comprising a collection of data sets containing identification codes and including classification data on an item-by-item basis, and one or a plurality of sub-databases, created based upon the original database, comprising a collection of data sets having a specific classification in common, characterized by comprising an input unit for accepting a search condition designating one or a plurality of classifications to be searched, memory means for storing designated classifications contained in a search condition whenever the search condition is accepted by the input unit, similarity calculating means for calculating degrees of similarity between designated classifications having a high frequency of occurrence among the designated classifications stored in the memory means and a specific classification common to the sub-databases, determination means for determining whether the designated classifications having a high frequency of occurrence include a designated classification that exhibits a low degree of similarity with regard to any specific classification in the degrees of similarity calculated by the similarity calculating means, and sub-database creating means which, when it has been determined by the determination means that the designated classifications having a high frequency of occurrence include a designated classification exhibiting a low degree of similarity with regard to any specific classification, is for creating a sub-database comprising a collection of data sets having the designated classification in common.

In accordance with the second aspect of the invention, if the stored designated classifications having a high frequency of occurrence include one which exhibits a low degree of similarity with regard to any specific classification, a sub-database comprising a collection of data sets having this designated classification in common is created. Accordingly, a specific classification common to the sub-databases will agree with that having a high frequency of designation in given search conditions.

The sub-database will be suited to the given search condition, search efficiency is improved and search time is shortened.
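The decision described for the second aspect can be sketched as follows. This passage does not fix a similarity measure or any thresholds, so the Jaccard measure over sets of classification strings, the parameters min_count and max_similarity, and the function name classifications_needing_sub_db are all assumptions made for illustration only.

```python
from collections import Counter
from typing import FrozenSet, Iterable, List

Classification = FrozenSet[str]  # e.g. frozenset({"sex=male"}); hypothetical encoding


def jaccard(a: Classification, b: Classification) -> float:
    """Assumed similarity measure: size of intersection over size of union."""
    if not a and not b:
        return 1.0
    return len(a & b) / len(a | b)


def classifications_needing_sub_db(
    history: Iterable[Classification],
    existing: List[Classification],
    min_count: int = 2,           # assumed cut-off for "high frequency of occurrence"
    max_similarity: float = 0.5,  # assumed cut-off for "low degree of similarity"
) -> List[Classification]:
    """Return frequent designated classifications that no sub-database resembles."""
    counts = Counter(history)
    needed = []
    for designated, count in counts.items():
        if count < min_count:
            continue  # not designated often enough to justify a new sub-database
        best = max((jaccard(designated, s) for s in existing), default=0.0)
        if best < max_similarity:
            needed.append(designated)  # low similarity to every existing sub-database
    return needed


fs = frozenset
history = [fs({"sex=male"}), fs({"income>=10M"}), fs({"income>=10M"}),
           fs({"sex=male"}), fs({"occupation=teacher"})]
existing = [fs({"sex=male"}), fs({"sex=female"}), fs({"income>=20M"})]
to_create = classifications_needing_sub_db(history, existing)
```

With these toy inputs, "sex=male" is frequent but already matched by an existing sub-database, "occupation=teacher" is too rare, and only "income>=10M" is both frequent and unmatched.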

A database management method according to a third aspect of the invention is a method of reorganizing sub-databases when there are one or a plurality of sub-databases, created based upon an original database comprising a collection of data sets containing identification codes and including classification data on an item-by-item basis, comprising a collection of data sets having a specific classification in common, characterized by storing designated classifications contained in a search condition whenever the search condition, which designates one or a plurality of classifications to be searched, is given, calculating degrees of similarity between designated classifications having a high frequency of occurrence among the stored designated classifications and a specific classification common to the sub-databases, and, in a case where there is a designated classification among the designated classifications having a high frequency of occurrence that exhibits a low degree of similarity with regard to any specific classification, creating a first sub-database comprising a collection of data sets having this designated classification in common, and a second sub-database comprising a collection of all data sets not contained in the first sub-database but contained in the original database.

An apparatus for executing the method according to the third aspect of the invention also is provided. Specifically, the apparatus is an apparatus for managing a database which includes one or a plurality of sub-databases, created based upon an original database comprising a collection of data sets containing identification codes and including classification data on an item-by-item basis, comprising a collection of data sets having a specific classification in common, characterized by comprising an input unit for accepting a search condition designating one or a plurality of classifications to be searched, memory means for storing designated classifications contained in a search condition whenever the search condition is accepted by the input unit, similarity calculating means for calculating degrees of similarity between designated classifications having a high frequency of occurrence among the designated classifications stored in the memory means and a specific classification common to the sub-databases, determination means for determining whether the designated classifications having a high frequency of occurrence include a designated classification that exhibits a low degree of similarity with regard to any specific classification in the degrees of similarity calculated by the similarity calculating means, and sub-database creating means which, when it has been determined by the determination means that the designated classifications having a high frequency of occurrence include a designated classification exhibiting a low degree of similarity with regard to any specific classification, is for creating a first sub-database comprising a collection of data sets having this designated classification in common, and a second sub-database comprising a collection of all data sets not contained in the first sub-database but contained in the original database.

In accordance with the third aspect of the invention, all data sets contained in the original database are stored in the first sub-database and in the second sub-database so that loss of data sets contained in the original database is prevented.

A database search method according to a fourth aspect of the invention is characterized by searching a database, in which there is stored a data set containing an identification code and including classification data on an item-by-item basis, by giving a search condition which designates the classification of the data set and using a predetermined search method decided by the given search condition and the structure of the database, successively storing, whenever a search is conducted, the search condition, search method and time required for a search when a search is conducted, calculating degrees of similarity between a given search condition and stored search conditions when a search condition has been given, reading out a search method used when search time, required when a search was conducted under a search condition having a high calculated degree of similarity, is short, and searching the database by the search method read out and outputting the above-mentioned identification code of the data set having classification data conforming to the search condition.

An apparatus for executing the method according to the fourth aspect of the invention also is provided. Specifically, the apparatus comprises database searching means for searching a database, in which there is stored a data set containing an identification code and including classification data on an item-by-item basis, by giving a search condition which designates the classification of the data set and using a predetermined search method decided by the given search condition and the structure of the database, memory means for successively storing, whenever a search is conducted, the search condition, search method and time required for a search when a search is conducted by the database searching means, similarity calculating means for calculating degrees of similarity between a given search condition and search conditions, which have been stored in the memory means, when a search condition has been given, search-method readout means for reading out a search method used when search time, required when a search was conducted under a search condition having a high degree of similarity calculated by the similarity calculating means, is short, and identification-code output means for searching the database by the search method read out by the search-method readout means and outputting the above-mentioned identification code of the data set having classification data conforming to the search condition.

In accordance with the fourth aspect of the invention, the degree of similarity between a given search condition and the stored search condition is calculated, and a search is conducted employing a search method used when search time, which was required when a search was conducted under a search condition having a high calculated degree of similarity, is short.

Accordingly, a search is conducted by a comparatively suitable search method and the time required for the search is curtailed.
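A minimal sketch of the fourth aspect's selection step follows. The SearchRecord fields mirror the three items the text says are stored (search condition, search method, time required); the Jaccard similarity, the min_similarity threshold and the function name choose_method are assumptions, since this passage does not specify how "high" similarity or "short" time are judged.

```python
from dataclasses import dataclass
from typing import FrozenSet, List, Optional


@dataclass
class SearchRecord:
    condition: FrozenSet[str]  # designated classifications of a past search
    method: str                # e.g. "sequential" or "direct"
    seconds: float             # time that past search required


def similarity(a: FrozenSet[str], b: FrozenSet[str]) -> float:
    """Assumed measure (Jaccard); the text does not name one here."""
    if not a and not b:
        return 1.0
    return len(a & b) / len(a | b)


def choose_method(
    given: FrozenSet[str],
    history: List[SearchRecord],
    min_similarity: float = 0.5,  # assumed cut-off for a "high" degree of similarity
) -> Optional[str]:
    """Among sufficiently similar past searches, reuse the method of the fastest one."""
    candidates = [r for r in history if similarity(given, r.condition) >= min_similarity]
    if not candidates:
        return None  # no similar history; the caller falls back to a default method
    return min(candidates, key=lambda r: r.seconds).method


HISTORY = [
    SearchRecord(frozenset({"sex=male"}), "sequential", 1.8),
    SearchRecord(frozenset({"sex=male", "occupation=company employee"}), "direct", 0.4),
    SearchRecord(frozenset({"income>=20M"}), "sequential", 0.2),
]
chosen = choose_method(frozenset({"sex=male"}), HISTORY)
```

Here the third record is the fastest overall but is ignored because its condition is dissimilar; of the two similar records, the "direct" search was quicker, so that method is read out.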

A database search method according to a fifth aspect of the invention is a search method for a case where there are one or a plurality of sub-databases, created based upon an original database comprising a collection of data sets containing identification codes and including classification data on an item-by-item basis, comprising a collection of data sets having a specific classification in common, wherein the sub-databases are reorganized, characterized by storing designated classifications contained in a search condition, search methods and times required for the searches whenever the search condition, which designates one or a plurality of classifications to be searched, is given, calculating degrees of similarity between designated classifications having a high frequency of occurrence among the stored designated classifications and a specific classification common to the sub-databases, in a case where there is a designated classification among the designated classifications having a high frequency of occurrence that exhibits a low degree of similarity with regard to any specific classification, creating a sub-database comprising a collection of data sets having this designated classification in common, storing a specific classification common to sub-databases whenever a sub-database is created, calculating designated-classification degrees of similarity between a designated classification contained in a given search condition and designated classifications that have been stored, calculating specific-classification degrees of similarity between a specific classification of a sub-database and specific classifications that have been stored, conducting a search of the above-mentioned sub-database by a search method used when the designated-classification degree of similarity and the specific-classification degree of similarity are high and the time required when a search was conducted at such time is short, and outputting the above-mentioned identification number of the data set having classification data conforming to the search condition.

An apparatus for executing the method according to the fifth aspect of the invention also is provided. Specifically, the apparatus is an apparatus for searching one or a plurality of sub-databases, created based upon an original database comprising a collection of data sets containing identification codes and including classification data on an item-by-item basis, comprising a collection of data sets having a specific classification in common, comprising an input unit for accepting a search condition designating one or a plurality of classifications to be searched, first memory means for storing designated classifications contained in a search condition, search methods and times required for the searches whenever the search condition is accepted by the input unit, first similarity calculating means for calculating degrees of similarity between designated classifications having a high frequency of occurrence among the designated classifications stored in the first memory means and a specific classification common to the sub-databases, determination means for determining whether the designated classifications having a high frequency of occurrence include a designated classification that exhibits a low degree of similarity with regard to any specific classification in the degrees of similarity calculated by the first similarity calculating means, sub-database creating means which, when it has been determined by the determination means that the designated classifications having a high frequency of occurrence include a designated classification exhibiting a low degree of similarity with regard to any specific classification, is for creating a sub-database comprising a collection of data sets having this designated classification in common, second memory means for storing a specific classification common to sub-databases whenever a sub-database is created by the sub-database creating means, second similarity calculating means for calculating degrees of similarity between a designated classification contained in a search condition entered by the input unit and designated classifications that have been stored in the first memory means, third similarity calculating means for calculating degrees of similarity between a specific classification of a sub-database and specific classifications that have been stored in the second memory means, and identification-number output means for conducting a search of the above-mentioned sub-database by a search method used when a designated-classification degree of similarity calculated by the second similarity calculating means and a specific-classification degree of similarity calculated by the third similarity calculating means are high and the time required when a search was conducted at such time is short, and outputting the above-mentioned identification number of the data set having classification data conforming to the search condition.

In accordance with the fifth aspect of the present invention, a search of the sub-database is conducted by a search method used when the designated-classification degree of similarity and the specific-classification degree of similarity are high and the time required for search performed at such time is short.

Accordingly, a search is conducted by a comparatively suitable search method and the time required for the search is curtailed in the fifth aspect of the invention as well.

BRIEF DESCRIPTION OF THE DRAWINGS

FIG. 1 illustrates a network system for searching a database;

FIG. 2 illustrates the details of a database system;

FIG. 3 illustrates the details of processing for creating search instructions;

FIG. 4 illustrates the details of processing for analyzing the characteristics of a database;

FIG. 5 illustrates the content of original data that have been stored in a grouped database;

FIG. 6 illustrates the content of data that have been stored in a database in which results of a search are stored;

FIG. 7 illustrates the content of data that have been stored in a database-structure database;

FIGS. 8a, 8b and 8c illustrate the content of data that have been stored in a sub-database contained in a grouped database;

FIGS. 9a, 9b and 9c illustrate the content of data that have been stored in a sub-database contained in a grouped database;

FIG. 10 illustrates the hardware configuration of a database system;

FIG. 11 illustrates a processing procedure when a search condition has been given by a user;

FIG. 12 illustrates a processing procedure for deciding a method of searching a database contained in a grouped database;

FIG. 13 illustrates part of a processing procedure for calculating the degree of similarity of database structures; and

FIG. 14 illustrates part of a processing procedure for calculating the degree of similarity of database structures.

BEST MODE FOR CARRYING OUT THE INVENTION

FIG. 1 illustrates the network system of a database.

A plurality of terminal devices 1A, 1B, . . . are capable of being connected to a database system 3 via a network 2.

When a user applies a database search condition to the terminal device 1A etc., the search condition is applied to the database system 3 via the network 2. The database system 3 contains a database from which data corresponding to the search condition from the user are retrieved. The retrieved data are applied to the terminal device 1A etc. via the network 2, whence the data are provided to the user.

FIG. 2 illustrates the database system 3 of such a database network system.

The system shown in FIG. 2 includes a grouped database 10, a search-result database 11 and a database-structure database 12.

The grouped database 10 is a database of a group of databases composed of one or a plurality of sub-databases 10A in which respective ones of data sets each containing an identification code and including classification data on an item-by-item basis are stored per data set having a specific classification in common. The database 10 is successively reorganized in a manner described later. At the very beginning the grouped database 10 is a single original database. For example, the grouped database 10 originally stores uncoordinated original data in which classifications such as male, female, company employee and government employee are included under each of such items as name, sex, occupation and annual income, as shown in FIG. 5. One set of data of names, sex, occupations and annual incomes in FIG. 5 is referred to as a "data set".

The grouped database 10 is constructed from the plurality of sub-databases 10A in response to the execution of a search, and the plurality of sub-databases are reorganized in accordance with the search.

For example, since the grouped database 10 is a single original database at the start, this single original database stores the uncoordinated data of the kind shown in FIG. 5. When a user subsequently applies a search condition "male", a search condition "female" and a search condition which includes a designated classification "annual income of 20,000,000 yen or more", etc., sub-databases 10A are created so as to store respective ones of data sets each of which has a common specific classification conforming to the respective search condition, as shown in FIGS. 8a, 8b and 8c. Furthermore, if application of a search condition "annual income of 10,000,000 yen or more" occurs more frequently than application of the search condition "annual income of 20,000,000 yen or more" in the database search, then, among the sub-databases 10A created in the manner set forth above, the sub-database composed of the data sets having the classification data "annual income of 20,000,000 yen or more" will be reorganized and a sub-database 10A composed of the data sets having the classification data "annual income of 10,000,000 yen or more" will be created.

The prescribed search method used when each of the plurality of sub-databases 10A contained in the grouped database 10 was searched, the search conditions, the search time required for the search and the results of the search are stored in the search-result database 11 successively for each search of the grouped database 10. An example of search results stored in the search-result database 11 is illustrated in FIG. 6.

In FIG. 6, Search No. indicates the number of times a search is performed. Grouped DB Structure No. specifies a database structure number that has been stored in the database-structure database 12, as will be described later. User Search Condition indicates the search condition which prevailed at the time of a data search. Further, data (Searched DB) representing the database used in a search of the grouped database 10, the pertinent number of cases, the time required for a search and the search method, etc., are stored in the search-result database 11 in correspondence with each database used in a search. In addition, the total number of pertinent cases of data in the entire grouped database 10 and the total search time required for the searches are stored in the search-result database 11.
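One possible in-memory shape for an entry of the search-result database 11 is sketched below. The field names are hypothetical renderings of the columns named in the text (Search No., Grouped DB Structure No., User Search Condition, Searched DB, pertinent number of cases, search time, search method); the sample values are invented, and computing the totals as simple sums assumes the sub-databases searched do not overlap.

```python
from dataclasses import dataclass, field
from typing import List


@dataclass
class PerDatabaseResult:
    searched_db: str  # which sub-database 10A was searched
    hits: int         # pertinent number of cases found in it
    seconds: float    # time required for that search
    method: str       # search method used, e.g. "sequential" or "direct"


@dataclass
class SearchResultEntry:
    search_no: int       # Search No.
    structure_no: int    # Grouped DB Structure No. (key into database 12)
    user_condition: str  # User Search Condition
    per_database: List[PerDatabaseResult] = field(default_factory=list)

    @property
    def total_hits(self) -> int:
        # Simple sum; valid only if the searched sub-databases are disjoint.
        return sum(r.hits for r in self.per_database)

    @property
    def total_seconds(self) -> float:
        return sum(r.seconds for r in self.per_database)


entry = SearchResultEntry(
    search_no=1,
    structure_no=1,
    user_condition="occupation = company employee",
    per_database=[
        PerDatabaseResult("male", hits=120, seconds=0.5, method="sequential"),
        PerDatabaseResult("female", hits=30, seconds=0.25, method="direct"),
    ],
)
```

Keeping the per-database rows separate from the totals preserves the detail that later processing needs when it compares search methods on a per-sub-database basis.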

The database-structure database 12 stores specific classifications common to data sets stored in the sub-databases 10A contained in the grouped database 10, the specific classifications representing the database structure. An example of data stored in the database-structure database 12 is illustrated in FIG. 7.

In FIG. 7, Grouped DB Structure No. specifies the data of the database structure. This corresponds to the number of times the grouped database 10 is reorganized. The data that have been stored in the database-structure database 12 are updated in response to reorganization of the grouped database 10.

The data that have been stored in the database-structure database 12 are stored in correspondence with individual databases of the plurality of sub-databases 10A contained in the grouped database 10. Examples of data that have been stored in individual databases of the plurality of sub-databases 10A contained in the grouped database 10 are illustrated in FIGS. 8a-8c. FIG. 8a illustrates some of the data that have been stored in a database storing a data set having the specific classification "male" for sex. FIG. 8b illustrates some of the data that have been stored in a database storing a data set having the specific classification "female" for sex. FIG. 8c illustrates some of the data that have been stored in a database storing a data set having the specific classification "20,000,000 yen or more" for annual income. In addition to these databases, the grouped database 10 includes auxiliary sub-databases that store the data remaining after the data stored in the sub-databases 10A of FIGS. 8a-8c are excluded from the original data illustrated in FIG. 5. As a result, the original data are maintained and never vanish. It should be noted, however, that since all of the original data are already covered by the data sets having the specific classifications "male" and "female" shown in FIGS. 8a and 8b, an auxiliary sub-database (second sub-database) is unnecessary in this example.
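The role of the auxiliary (second) sub-database can be illustrated with a short sketch. The rows, the "id" key and the function name auxiliary_sub_database are hypothetical; the only point shown is that the auxiliary sub-database is the set of original data sets not covered by any existing sub-database, and that it comes out empty when "male" and "female" together cover everything.

```python
from typing import Dict, List

DataSet = Dict[str, object]

# Toy original data (hypothetical rows, not taken from FIG. 5).
original: List[DataSet] = [
    {"id": 1, "sex": "male", "income": 25_000_000},
    {"id": 2, "sex": "female", "income": 8_000_000},
    {"id": 3, "sex": "male", "income": 6_000_000},
    {"id": 4, "sex": "female", "income": 21_000_000},
]


def auxiliary_sub_database(
    original: List[DataSet], sub_databases: List[List[DataSet]]
) -> List[DataSet]:
    """Data sets of the original database that no sub-database contains."""
    covered = {ds["id"] for sub in sub_databases for ds in sub}
    return [ds for ds in original if ds["id"] not in covered]


male = [ds for ds in original if ds["sex"] == "male"]
female = [ds for ds in original if ds["sex"] == "female"]
high_income = [ds for ds in original if ds["income"] >= 20_000_000]

# "male" and "female" together cover every row, so nothing is left over.
leftover_all = auxiliary_sub_database(original, [male, female, high_income])

# With only the income sub-database, the remaining rows must be kept separately.
leftover_income_only = auxiliary_sub_database(original, [high_income])
```

In the second case the two rows below the income threshold would be lost without an auxiliary sub-database, which is the situation the third aspect guards against.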

With reference again to FIG. 2, a search condition which designates a classification to be searched is given by the user and the data representing this search condition is applied to search-condition analysis processing 21 (implemented by a program) and search-result analysis processing 24 (implemented by a program).

In search-condition analysis processing 21, the content of the search condition given by the user is ascertained and data representing the content of the search condition are applied to search-instruction creation processing 30 (implemented by a program).

In response to application of the data representing the content of the search condition to search-instruction creation processing 30, an instruction for searching the grouped database 10 is created in the search-instruction creation processing 30. More specifically, instructions for deciding search methods (e.g., sequential, direct, etc.) are created in correspondence with each of the plurality of sub-databases 10A contained in the grouped database 10. The details of this search-instruction creation processing will be described later.

A search instruction created in the search-instruction creation processing 30 is applied to search execution processing 22 (implemented by a program) and search-result storage/readout control processing 23 (implemented by a program).

By applying the search instruction to the search execution processing 22, the grouped database 10 is searched in accordance with the applied search condition and data conforming to the search condition are read out of the grouped database 10 and outputted.

The data from the grouped database 10 are not only outputted and provided to the user but also applied to the search-result analysis processing 24 (implemented by a program).

On the basis of the data outputted by the grouped database 10 owing to the search and the search instruction provided by the search-instruction creation processing 30, the search-result analysis processing 24 goes to the plurality of sub-databases 10A contained in the grouped database 10 to analyze the database utilized in the search, the search condition, the pertinent number of cases of the data, the time required for the search and the search method, etc. These analyzed data are applied to the search-result storage/readout control processing 23 (implemented by a program). The data analyzed in the search-result analysis processing 24 are applied to the search-result database 11 by the search-result storage/readout control processing 23 and are stored in the manner shown in FIG. 6.

The grouped database 10 is reorganized when reorganization of the sub-databases 10A currently included in the grouped database 10 is judged to be necessary. The data stored in the search-result database 11 as a result of this reorganization are read out by the search-result storage/readout control processing 23 and applied to database-characteristic analysis processing 25 (implemented by a program).

On the basis of the data that have been stored in the search-result database 11, the database-characteristic analysis processing 25 analyzes the characteristic of the structure of the sub-databases 10A currently in the grouped database 10, judges whether reorganization of the grouped database 10 is necessary and, when reorganization is necessary, applies data indicative of this fact to reorganization-instruction creation processing 40 (implemented by a program). The details of analytical processing in the database-characteristic analysis processing 25 will be described later.

Data representing an instruction for reorganizing the grouped database 10 are created in the reorganization-instruction creation processing 40 and applied to database structure storage/readout control processing 26 (implemented by a program) and reorganization execution processing 27 (implemented by a program).

The databases contained in the grouped database 10 are reorganized by the reorganization execution processing 27. Further, when databases contained in the grouped database 10 are reorganized, data representing the structures of these reorganized databases are stored in the database-structure database 12 in correspondence with the respective databases by the database structure storage/readout control processing 26. Thus, the data stored in the database-structure database 12 are updated along with the reorganization of the sub-databases 10A contained in the grouped database 10.

The reorganization of the sub-databases 10A contained in the grouped database 10 can be carried out also by a reorganization instruction provided by the user. To this end, the database search system includes reorganization-instruction analysis processing 28 (implemented by a program).

A reorganization instruction given by the user is applied to the reorganization-instruction analysis processing 28, and the content of the reorganization instruction (namely into which collections of data sets, with which specific classifications, the database is to be partitioned) is analyzed by the reorganization-instruction analysis processing 28. The analytical data are applied to the reorganization-instruction creation processing 40. Reorganization instruction data conforming to the reorganization instruction provided by the user are created in the reorganization-instruction creation processing 40, and the database-structure database 12 and the sub-databases 10A contained in the grouped database 10 are reorganized by the database structure storage/readout control processing 26 and the reorganization execution processing 27.

FIG. 3 illustrates the details of the search-instruction creation processing 30.

When a search is conducted under a search condition provided by the search-condition analysis processing 21, the search-instruction creation processing 30 decides a search method for each of the plurality of sub-databases 10A contained in the grouped database 10, in such a manner that the time required for the search is shortened. The grouped database 10 is then searched using the search method decided for each sub-database 10A by the search-instruction creation processing 30.

The data provided by the search-condition analysis processing 21 and representing the content of the search condition are applied to search-result database search-condition processing 31, search-condition similarity calculation processing 32 and database-structure database search-condition decision processing 34.

When the data representing the content of the search condition are applied to the search-result database search-condition processing 31, readout control data for reading out data, which have been stored in the search-result database 11, are created by the search-result database search-condition processing 31. The readout control data are applied to the search-result storage/readout control processing 23 and all of the data that have been stored in the search-result database 11 are read out in succession.

The data read out of the search-result database 11 are applied to the search-condition similarity calculation processing 32 via the search-result storage/readout control processing 23. The degree of similarity between the search condition provided by the search-condition analysis processing 21 and the search condition of each single search is calculated in the search-condition similarity calculation processing 32 with regard to all data that have been stored in the search-result database 11. This search-condition similarity calculation processing can be executed in accordance with database-structure similarity calculation processing, described later.

When the data representing the content of the search condition are provided by the search-condition analysis processing 21, readout control data are created in the database-structure database search-condition decision processing 34 in such a manner that the data of the latest database structure (these are data representing the structures of all sub-databases 10A contained in the grouped database 10) stored in the database-structure database 12 will be read out. The readout control data are applied to the database-structure database 12 via the database structure storage/readout control processing 26. As a result, the latest data of the data stored in the database-structure database 12 are read out. The latest data read out of the database-structure database 12 are applied to database-structure similarity calculation processing 33 via the database structure storage/readout control processing 26.

Further, data other than that of the latest database structure stored in the database-structure database 12 also are read out successively and applied to the database-structure similarity calculation processing 33.

The database-structure similarity calculation processing 33 calculates the degree of similarity between the data of the latest database structure stored in the database-structure database 12 and the data of other database structures. In other words, the database-structure degree of similarity between the structure of a database contained in the grouped database 10 and a database structure represented by the database structure data stored in the database-structure database 12 is calculated by the database-structure similarity calculation processing 33. The details of processing for calculating the database-structure degree of similarity will be described later.

The search-condition degree of similarity calculated in the search-condition similarity calculation processing 32 and the database-structure degree of similarity calculated in the database-structure similarity calculation processing 33 are each applied to similarity synthesizing processing 35. The similarity synthesizing processing 35 combines the search-condition degree of similarity and the database-structure degree of similarity into a single synthesized degree of similarity; the combination can be based upon an algebraic product or a logical product.
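By way of illustration only, the combination performed in the similarity synthesizing processing 35 might be sketched as follows. The function name `synthesize_similarity`, the `mode` parameter and the restriction of the degrees to the interval [0, 1] are assumptions made for this sketch, as is the reading of "logical product" as the minimum (the usual fuzzy-logic conjunction).

```python
def synthesize_similarity(cond_sim: float, struct_sim: float,
                          mode: str = "product") -> float:
    """Combine a search-condition and a database-structure degree of similarity.

    Both inputs are assumed (not stated in the specification) to lie in [0, 1].
    """
    for s in (cond_sim, struct_sim):
        if not 0.0 <= s <= 1.0:
            raise ValueError("degrees of similarity are assumed to lie in [0, 1]")
    if mode == "product":
        # Algebraic product: both degrees must be high for a high result.
        return cond_sim * struct_sim
    if mode == "min":
        # Logical product, interpreted here as the fuzzy AND (minimum).
        return min(cond_sim, struct_sim)
    raise ValueError(f"unknown mode: {mode}")
```

Under either choice the synthesized degree is high only when both input degrees are high, which is the property the subsequent search-method decision relies upon.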

The synthesized degree of similarity is applied to search-result prediction processing 36. For the databases contained in the current grouped database 10, the search-result prediction processing 36 predicts, for each search method, the number of data sets conforming to the search condition provided by the user (the pertinent number of cases) and the time required for a search by that search method. The pertinent number of cases and the search time predicted for each search method in the search-result prediction processing 36 are applied to search-method decision processing 37.

A search method for a data search is decided by the search-method decision processing 37 for every sub-database 10A in the grouped database 10 in such a manner that the time required for a data search of the grouped database 10 is shortened. The data representing the search method decided are applied to the search execution processing 22 and the data search is conducted using the search method decided for every sub-database 10A contained in the grouped database 10.

Thus, an appropriate search method is decided for each of the sub-databases 10A contained in the grouped database 10 and the data search is conducted using the search methods decided. As a result, the time required for the data search is shortened.

FIG. 4 illustrates the details of database-characteristic analysis processing.

In database-characteristic analysis processing 25, data are exchanged with a database-structure characteristic common-knowledge database 45, which stores general knowledge relating to the database structure, and with a database-structure characteristic comparative-knowledge database 46, which stores knowledge for comparison with other database structures.

The following rules are stored in the database-structure characteristic common-knowledge database 45, by way of example:

Rule 1: If a user search condition frequently uses classifications which are few in type and grouping is performed by such classification, then a specific classification extracted with this database structure will be fairly good.

Rule 2: If a user search condition frequently uses classifications of many types and grouping is performed by such classification, then a specific classification extracted with this database structure will not be too good.

Rule 3: If there is a threshold value of a specific-classification range extracted where there are many types, then the setting of the specific-classification range extracted with this database structure will be slightly unsatisfactory.

Rule 4: If a large amount of memory capacity is used, then a specific classification extracted with this database structure will not be very good.

The following rules are stored in the database-structure characteristic comparative-knowledge database 46, by way of example:

Rule 1: If there are a plurality of database structures grouped by similar specific classifications and mean search time is shorter than that of other database structures, then these database structures will be good.

Rule 2: If there are a plurality of database structures grouped by similar specific classifications and database structures grouped even by other specific classifications have a shorter mean search time, then these database structures will lack extracted specific classifications.

Rule 3: If there are a plurality of database structures grouped solely by similar specific classifications and other database structures have a shorter mean search time, then these database structures will be unsatisfactory in terms of the setting of the specific-classification range that has been extracted.

Rule 4: If there are a plurality of database structures having similar mean search times and other database structures use less memory capacity, then a specific classification extracted with these database structures will not be very good.

Among the data that have been stored in the search-result database 11, data representing the results of searching the databases contained in the grouped database 10 are read out of the search-result database 11 by the search-result storage/readout control processing 23. The data read out are applied to the database-characteristic analysis processing 25 via the search-result storage/readout control processing 23.

The data applied to the database-characteristic analysis processing 25 representing the results of searching the databases contained in the grouped database 10 are applied to database-structure characteristic general judgment processing 43 and database-structure characteristic comparative judgment processing 44.

The rules that have been stored in the database-structure characteristic common-knowledge database 45 are applied to the database-structure characteristic general judgment processing 43 as well. The degree of conformity of the general characteristics of the databases in the sub-databases 10A contained in the grouped database 10 is judged in the database-structure characteristic general judgment processing 43 on the basis of the rules that have been stored in the database-structure characteristic common-knowledge database 45. The degree of conformity representing the result of this judgment is applied to summing synthesis processing 41 and 42.

The rules that have been stored in the database-structure characteristic comparative-knowledge database 46 are applied to the database-structure characteristic comparative judgment processing 44 as well. The degree of conformity of the comparative characteristics of the databases in the sub-databases 10A contained in the grouped database 10 is judged in the database-structure characteristic comparative judgment processing 44 on the basis of the rules that have been stored in the database-structure characteristic comparative-knowledge database 46. The degree of conformity representing the result of this judgment also is applied to the summing synthesis processing 41 and 42.

Processing for summing the degrees of conformity, each of which has been given a prescribed weighting, is performed in the summing synthesis processing 41 and 42. The degree of conformity obtained from the summing synthesis processing 41 is applied to reorganization-instruction creation processing 40 as a subdivided-item degree of conformity representing the suitability of specific classifications of sub-databases 10A contained in the grouped database 10, and the degree of conformity obtained by the summing synthesis processing 42 is applied to the reorganization-instruction creation processing 40 as a subdivided-range degree of conformity representing the suitability of a range of classifications (e.g., annual income greater than 10,000,000 yen and less than 20,000,000 yen) of the sub-databases 10A contained in the grouped database 10.
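The weighted summation performed in the summing synthesis processing 41 and 42 might be sketched as follows. This is only a sketch: the function name `weighted_conformity` and all numerical weights and degrees are hypothetical, since the specification states only that each degree of conformity is given "a prescribed weighting".

```python
def weighted_conformity(degrees, weights):
    """Weighted sum of degrees of conformity (one weight per degree)."""
    if len(degrees) != len(weights):
        raise ValueError("one weight is required per degree of conformity")
    return sum(d * w for d, w in zip(degrees, weights))


# Hypothetical outputs of the general judgment (43) and comparative judgment (44).
general, comparative = 0.8, 0.4

# Summing synthesis 41 and 42 are assumed here to differ only in their weights.
item_conformity = weighted_conformity([general, comparative], [0.5, 0.5])
range_conformity = weighted_conformity([general, comparative], [0.25, 0.75])
```

Choosing weights that sum to one keeps each result in the same range as the inputs, which makes the two outputs directly comparable in the reorganization decision.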

Whether the sub-databases 10A contained in the grouped database 10 are to be reorganized is determined in the reorganization-instruction creation processing 40 based upon the subdivided-item degree of conformity and subdivided-range degree of conformity provided by the database-characteristic analysis processing 25. If reorganization is necessary, the reorganization-instruction creation processing 40 creates a reorganization instruction and updates the data contained in the grouped database 10.

The determination regarding reorganization is performed in accordance with the following conditions, by way of example:

Rule 1: In a case where the present subdivided-item degree of conformity and subdivided-range degree of conformity are both sufficiently high, the database structure must not be reorganized.

Rule 2: In a case where the present subdivided-item degree of conformity is sufficiently high and the subdivided-range degree of conformity is low, the specific classification range extracted in a statistical analytical manner (in the manner of a technique for finding a maximum value) is reorganized.

Rule 3: In a case where a database structure in which the present subdivided-item degree of conformity is low and the other subdivided-item degrees of conformity are high exists, the specific classification to be extracted is changed and the specific classification range extracted in a statistical analytical manner (in the manner of a technique for finding a maximum value) is reorganized.

Rule 4: In a case where a database structure in which the present subdivided-item degree of conformity is low and the other subdivided-item degrees of conformity are high does not exist, the database characteristic is analyzed and a specific classification extracted by a statistical technique is set.
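The four reorganization rules above can be read as a small decision procedure. The sketch below is one possible reading, not a definitive implementation: the names `Action` and `decide_reorganization` are hypothetical, and the use of a single threshold `high` to separate "sufficiently high" from "low" is an assumption, since the specification gives no numerical criterion.

```python
from enum import Enum


class Action(Enum):
    KEEP = "do not reorganize"                           # Rule 1
    RETUNE_RANGE = "reorganize classification range"     # Rule 2
    CHANGE_CLASSIFICATION = "change classification"      # Rule 3
    DERIVE_CLASSIFICATION = "derive new classification"  # Rule 4


def decide_reorganization(item_conf, range_conf, other_item_confs, high=0.7):
    """Map the two degrees of conformity to one of the four rules.

    other_item_confs holds the subdivided-item degrees of conformity of
    other (stored) database structures; `high` is an assumed threshold.
    """
    if item_conf >= high:
        # Rules 1 and 2: the classification itself is suitable.
        return Action.KEEP if range_conf >= high else Action.RETUNE_RANGE
    if any(c >= high for c in other_item_confs):
        # Rule 3: another structure has a more suitable classification.
        return Action.CHANGE_CLASSIFICATION
    # Rule 4: no better structure is known, so a classification is derived anew.
    return Action.DERIVE_CLASSIFICATION
```

For example, `decide_reorganization(0.9, 0.3, [])` selects `Action.RETUNE_RANGE`, corresponding to Rule 2.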

FIG. 10 illustrates the hardware configuration of the database system 3 shown in FIG. 1 (the details of which are depicted in FIGS. 2 and 3). The database system is implemented by a computer system.

The database system 3 includes a central processing unit (CPU) 5 which executes processing for analyzing data, processing for creating data, etc. Connected to the central processing unit 5 are an interface 4 for accepting data provided by a terminal device via a bus and outputting retrieved data to the terminal device, a storage device 6 which supplies a program area in which a program executed by the central processing unit 5 is stored, a working area and a buffer area, etc., for various calculations, a database group (memory unit) 7 which includes various databases for storing data, and an output unit (a CRT display device, a printer, a data writing device for writing data in a magnetic disk, etc.) 3 for outputting data in a visible or machine-readable manner.

More specifically, the storage device 6 is implemented by a RAM and a ROM, etc., and the memory unit 7 is implemented by a hard disk, magnetic tape, etc.

The database group 7 includes the above-mentioned grouped database 10, search-result database 11, database-structure database 12, database-structure characteristic common-knowledge database 45 and database-structure characteristic comparative-knowledge database 46.

In accordance with a preset program, the central processing unit 5 executes the search-condition analysis processing 21, search execution processing 22, search-result storage/readout control processing 23, search-result analysis processing 24, database-characteristic analysis processing 25, database structure storage/readout control processing 26, reorganization execution processing 27, reorganization-instruction analysis processing 28, search-instruction creation processing 30 and reorganization-instruction creation processing 40.

FIG. 11 illustrates a processing procedure executed when a search condition has been given by a user. This processing is executed by the central processing unit 5.

The search condition from the user is applied to the search-condition analysis processing 21 via the interface 4 and the analyzed content thereof is applied to the search-instruction creation processing 30 (step 51). The optimum search method is decided in the search-instruction creation processing 30 for each of the plurality of sub-databases 10A contained in the grouped database 10 (step 52). The databases contained in the grouped database 10 are searched using the decided search methods in the search execution processing 22 (step 53).

When searching of the sub-databases 10A contained in the grouped database 10 is finished, the data representing the search condition and search results, etc., are applied to the search-result database 11, which is thus updated (step 54).

Next, it is determined in database-characteristic analysis processing 25 whether the structure of a sub-database 10A contained in the present grouped database 10 is appropriate, that is, whether reorganization is required (step 55). If the structure of the present sub-database is appropriate, the sub-database contained in the present grouped database 10 is not reorganized and the database-structure database 12 is not updated either (NO at step 55). If the structure of the present sub-database is not appropriate, the sub-database contained in the present grouped database 10 is reorganized and this is accompanied by updating of the database-structure database 12 as well (YES at step 55; steps 56, 57).

For example, assume that the present database structure comprises a sub-database which consists of a data set having the classification data "male" in common, as shown in FIG. 8a, a sub-database which consists of a data set having the classification data "female" in common, as shown in FIG. 8b, and a sub-database which consists of a data set having the classification data "annual income of 20,000,000 yen or more" in common, as shown in FIG. 8c.

When a search condition specifying that all classification data "male" is to be searched is applied under these circumstances, the sub-database in which the data shown in FIG. 8a have been stored is utilized and all of the data stored in this sub-database are retrieved and outputted by, say, a sequential search. When a search condition specifying that all classification data "annual income of 30,000,000 yen or more" is to be searched is applied, the sub-database in which the data shown in FIG. 8c have been stored is utilized and, of the data that have been stored in this sub-database, all of the data having the classification data "annual income of 30,000,000 yen or more" are retrieved and outputted by, say, a sequential search.

When a search condition specifying that all classification data "annual income of 10,000,000 yen or more" is to be searched is applied, a search omission would occur if only the sub-database in which the data shown in FIG. 8c have been stored were searched, since that sub-database holds no data sets having an annual income of less than 20,000,000 yen. Accordingly, the data shown in FIGS. 8a and 8b (since these data are data having the classification data "male" and "female", they include all of the original data) are searched by a sequential search and the pertinent data are retrieved and outputted.

Further, if the frequency of application of the search condition specifying that all classification data "annual income of 10,000,000 yen or more" is to be searched is high and the frequency of application of the search condition specifying that all classification data "annual income of 20,000,000 yen or more" is to be searched is low, the sub-database structure is reorganized in such a manner that the sub-database in which the data shown in FIG. 8c have been stored will be constituted by data having the classification data "annual income of 10,000,000 yen or more", as illustrated in FIG. 9c.
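The example of FIGS. 8a to 8c and 9c can be reproduced in a few lines. The sketch below is illustrative only: the four records, the field names and the functions `build_sub_databases`, `search_income` and `reorganized_floor` are all hypothetical, and "high frequency" is simplified to "the most frequently requested income threshold".

```python
from collections import Counter

# Hypothetical original database: identification code plus classification data.
ORIGINAL = [
    {"id": 1, "sex": "male", "income": 25_000_000},
    {"id": 2, "sex": "female", "income": 12_000_000},
    {"id": 3, "sex": "male", "income": 8_000_000},
    {"id": 4, "sex": "female", "income": 31_000_000},
]


def build_sub_databases(records, income_floor):
    """Group the records into the three sub-databases of FIGS. 8a to 8c."""
    return {
        "male": [r for r in records if r["sex"] == "male"],
        "female": [r for r in records if r["sex"] == "female"],
        "income": [r for r in records if r["income"] >= income_floor],
    }


def search_income(subs, income_floor, query_floor):
    """Sequentially search for 'annual income of query_floor or more'."""
    if query_floor >= income_floor:
        # The income sub-database already contains every pertinent data set.
        source = subs["income"]
    else:
        # Searching only the income sub-database would omit data sets, so
        # fall back to "male" + "female", which together cover all data.
        source = subs["male"] + subs["female"]
    return sorted(r["id"] for r in source if r["income"] >= query_floor)


def reorganized_floor(requested_floors):
    """Pick the most frequently requested income threshold."""
    return Counter(requested_floors).most_common(1)[0][0]


subs = build_sub_databases(ORIGINAL, 20_000_000)
hits_30m = search_income(subs, 20_000_000, 30_000_000)
hits_10m = search_income(subs, 20_000_000, 10_000_000)

# After frequent 10,000,000-yen queries the income sub-database is rebuilt.
new_floor = reorganized_floor([10_000_000, 10_000_000, 20_000_000])
subs_after = build_sub_databases(ORIGINAL, new_floor)
```

After the rebuild, a query for "annual income of 10,000,000 yen or more" can be answered from the income sub-database alone, which is the speed-up the reorganization is intended to provide.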

FIG. 12 illustrates a processing procedure for deciding a search method for each sub-database 10A contained in the grouped database 10, in dependence upon the search condition provided by the user. This processing also is executed by the central processing unit 5.

First, the database-structure degree of similarity between the structure of the present sub-database 10A contained in the grouped database 10 and a database structure that has been stored in the database-structure database 12 is calculated (step 61). The details of this processing for calculating the database-structure degree of similarity will be described later. Next, the search-condition degree of similarity between the search condition provided by the user and a search condition that has been stored in the search-result database 11 is calculated (step 62). The calculated database-structure degree of similarity and search-condition degree of similarity are combined (step 63).

The search time associated with a sub-database 10A contained in the present grouped database 10 is predicted in correspondence with various search methods (step 64).

A search method for which the predicted search time is short and the degree of similarity obtained by combination is high is decided for each database contained in the grouped database 10 (step 65).
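Steps 64 and 65 might be sketched as follows. The sketch makes several assumptions that are not taken from the specification: the search time of a method is predicted as the similarity-weighted mean of the times of past searches, only past searches whose synthesized degree of similarity reaches `min_similarity` are considered, and the names `PastSearch`, `decide_search_method` and the method labels are hypothetical.

```python
from collections import defaultdict
from typing import NamedTuple


class PastSearch(NamedTuple):
    method: str        # hypothetical labels such as "sequential" or "indexed"
    seconds: float     # time the past search required
    similarity: float  # synthesized degree of similarity to the present request


def decide_search_method(history, min_similarity=0.5, default="sequential"):
    """Choose the method with the shortest predicted time among similar searches."""
    weighted_time = defaultdict(float)
    weight = defaultdict(float)
    for rec in history:
        if rec.similarity >= min_similarity:
            # More similar past searches contribute more to the prediction.
            weighted_time[rec.method] += rec.similarity * rec.seconds
            weight[rec.method] += rec.similarity
    if not weight:
        return default  # no sufficiently similar past search is available
    predicted = {m: weighted_time[m] / weight[m] for m in weight}
    return min(predicted, key=predicted.get)


history = [
    PastSearch("sequential", 4.0, 0.9),
    PastSearch("indexed", 1.0, 0.8),
    PastSearch("indexed", 0.1, 0.2),  # ignored: too dissimilar
]
chosen = decide_search_method(history)
```

In this sketch the decision is made once per sub-database 10A, with `history` restricted to past searches of that sub-database.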

FIGS. 13 and 14 illustrate the procedure of processing for calculating the database-structure degree of similarity. This processing also is executed by the central processing unit 5.

The database-structure degree of similarity is calculated between the database structure of the present grouped database 10 and every database structure stored in the database-structure database 12, that is, the structures of the reorganized sub-databases 10A that were included in the grouped database 10.

The database-structure degree of similarity is calculated for each database structure which represents the structure of a sub-database 10A contained in the grouped database 10. As shown in FIG. 7, the database structure includes database structures in correspondence with the number of sub-databases contained in the grouped database 10. Stored in each database structure are extracted specific classifications (extracted classifications) indicating classifications having in common a data set that has been stored in the database, and an extracted specific-classification range (extracted classification range).

The number of times coincidence is achieved between an extracted classification or extracted classification range and a database extracted classification or extracted-classification range contained in the grouped database 10 is calculated, for each database structure, with regard to all database structures of the grouped-database structures, and the grouped-database structure degree of similarity is calculated on the basis of the number of times coincidence is achieved.

With reference to FIGS. 13 and 7, the Grouped DB No., DB Structure No. and Subdivision No. are each reset (step 71). Further, the DB structure degree of similarity and the number of times coincidence is achieved with regard to the extracted classification or extracted-classification range of the database structure data having the DB Structure No. i whose database structure degree of similarity is to be calculated are each reset (step 72).

Next, the DB No. and the Extracted Classification No. of the database-structure database 12 are reset (step 73). First, it is determined whether coincidence has been achieved with regard to the first extracted classification or extracted-classification range at DB Structure No. 0 (step 74). If coincidence is achieved (YES at step 74), then the number of times coincidence has been achieved with regard to the extracted classification of grouped DB Structure No. 0 is incremented (step 75) and the degree of similarity of this extracted classification is calculated (step 76). The degree of similarity of this extracted classification can be calculated by providing a degree-of-similarity table with regard to discrete extracted classifications (e.g., sex) or based upon degree of similarity in well-known fuzzy sets with regard to a continuous extracted-classification range.

Next, the degrees of similarity of the extracted classifications are summed, the DB structure degree of similarity of the Grouped DB Structure No. i is calculated (step 77) and the number of extracted classifications is incremented so as to calculate the degree of similarity of the next extracted classification (step 78).

It is determined whether the number of extracted classifications contained in the DB structure has been attained (step 79). If the number of classifications has not been attained, processing from step 74 onward is repeated.

If the extracted classification or extracted-classification range of the grouped database 10 and the extracted classification or extracted-classification range of the DB structure do not coincide, the Extracted Classification No. of the databases of grouped database 10 is incremented (step 86) and it is determined whether the Extracted Classification No. has attained the number of extracted classifications of the grouped database 10 (step 87). If the number of extracted classifications of the grouped database 10 has not been attained (NO at step 87), then the processing from step 74 onward is repeated. If the number of extracted classifications has been attained (YES at step 87), the processing from step 74 onward is repeated with regard to the next database structure contained in the grouped database 10 (steps 88, 89).

If calculation of the degree of similarity of extracted classifications is finished with regard to all extracted classifications or the number of extracted classifications contained in the database structure (YES at step 79), then the degree of similarity of this condition is calculated for each extracted classification or extracted-classification range with regard to the next database structure and the degree of similarity of the database structure is calculated. If calculation of the degrees of similarity of all database structures contained in the grouped database structure is finished (NO at step 81), then the degree of similarity of the grouped database structure is outputted (step 82) and processing for calculating the degree of similarity of the next grouped database structure is performed (steps 83, 84). When the degree of similarity of the next grouped database structure is calculated, the degree of similarity and the number of times coincidence has been achieved with regard to the extracted classification are reset (step 85).
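The core of the calculation in FIGS. 13 and 14 (counting coincidences and summing per-classification degrees of similarity) might be sketched as follows. This is a simplified stand-in rather than the flowchart itself: exact matching replaces the degree-of-similarity table for discrete classifications, an interval-overlap ratio replaces the fuzzy-set degree of similarity for ranges (the specification names no particular formula), and the names `range_similarity`, `classification_similarity` and `structure_similarity` are hypothetical.

```python
def range_similarity(a, b):
    """Overlap ratio of two closed intervals (lo, hi); an assumed measure."""
    overlap = max(0.0, min(a[1], b[1]) - max(a[0], b[0]))
    union = max(a[1], b[1]) - min(a[0], b[0])
    return overlap / union if union > 0 else 1.0


def classification_similarity(a, b):
    """Similarity of two extracted classifications given as (item, value)."""
    (item_a, val_a), (item_b, val_b) = a, b
    if item_a != item_b:
        return 0.0
    if isinstance(val_a, tuple) and isinstance(val_b, tuple):
        return range_similarity(val_a, val_b)   # continuous range
    return 1.0 if val_a == val_b else 0.0       # discrete classification


def structure_similarity(current, stored):
    """Return (summed degree of similarity, number of coincidences)."""
    total, coincidences = 0.0, 0
    for s in stored:
        best = max((classification_similarity(c, s) for c in current),
                   default=0.0)
        if best > 0.0:
            coincidences += 1
            total += best
    return total, coincidences


# Income ranges in units of 1,000,000 yen, capped at a hypothetical 100.
current = [("sex", "male"), ("sex", "female"), ("income", (20, 100))]
stored = [("sex", "male"), ("sex", "female"), ("income", (10, 100))]
total, coincidences = structure_similarity(current, stored)
```

Here the two structures agree on both sex classifications and overlap on most of the income range, so all three stored classifications count as coincidences.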

The degree of similarity of the search condition is calculated in the same manner as the degree of similarity of the database structure. The degrees of similarity between individual designated classifications contained in an applied search condition and individual classifications stored as search conditions in the search-result database 11 are calculated in dependence upon coincidence or approximation (and, in the case of approximation, upon how close the approximation is). The degree of similarity of the search condition calculated in this way is combined with the database-structure degree of similarity, as mentioned above, to obtain the synthesized degree of similarity.

What is claimed is:
 1. A database search method comprising the steps of: searching a database, in which there is stored a data set containing an identification code and including classification data on an item-by-item basis, by giving a search condition which designates the classification of the data set and using a predetermined search method decided by the given search condition and the structure of the database; successively storing, whenever a search is conducted, the search condition, search method and time required for a search when a search is conducted; calculating degrees of similarity between a given search condition and stored search conditions when a search condition has been given; reading out a search method used when search time, required when a search was conducted under a search condition having a high calculated degree of similarity, is short; and searching said database by the search method read out and outputting said identification code of the data set having classification data conforming to the search condition.
 2. A database search method for a case where there are one or a plurality of sub-databases, created based upon an original database comprising a collection of data sets containing identification codes and including classification data on an item-by-item basis, comprising a collection of data sets having a specific classification in common, wherein the sub-databases are reorganized, comprising the steps of: storing designated classifications contained in a search condition, search methods and times required for the searches whenever the search condition, which designates one or a plurality of classifications to be searched, is given; calculating degrees of similarity between designated classifications having a high frequency of occurrence among the stored designated classifications and a specific classification common to the sub-databases; in a case where there is a designated classification among the designated classifications having a high frequency of occurrence that exhibits a low degree of similarity with regard to any specific classification, creating a sub-database comprising a collection of data sets having this designated classification in common; storing a specific classification common to sub-databases whenever a sub-database is created; calculating designated-classification degrees of similarity between a designated classification contained in a given search condition and designated classifications that have been stored; calculating specific-classification degrees of similarity between a specific classification of a sub-database and specific classifications that have been stored; conducting a search of said sub-database by a search method used when the designated-classification degree of similarity and the specific-classification degree of similarity are high and the time required when a search was conducted at such time is short, and outputting said identification number of the data set having classification data conforming to the search condition.
 3. A database search apparatus comprising: database searching means for searching a database, in which there is stored a data set containing an identification code and including classification data on an item-by-item basis, by giving a search condition which designates the classification of said data set and using a predetermined search method decided by the given search condition and the structure of said database; memory means for successively storing, whenever a search is conducted, the search condition, search method and time required for a search when a search is conducted by said database search means; similarity calculating means for calculating degrees of similarity between a given search condition and search conditions, which have been stored in said memory means, when a search condition has been given; search-method readout means for reading out a search method used when search time, required when a search was conducted under a search condition having a high degree of similarity calculated by said similarity calculating means, is short; and identification-code output means for searching said database by the search method read out by said search-method readout means and outputting said identification code of the database having classification data conforming to the search condition.
 4. A database search apparatus for searching one or a plurality of sub-databases, created based upon an original database comprising a collection of data sets containing identification codes and including classification data on an item-by-item basis, comprising a collection of data sets having a specific classification in common, comprising: an input unit for accepting a search condition designating one or a plurality of classifications to be searched; first memory means for storing designated classifications contained in a search condition, search methods and times required for the searches whenever the search condition is accepted by said input unit; first similarity calculating means for calculating degrees of similarity between designated classifications having a high frequency of occurrence among the designated classifications stored in said first memory means and a specific classification common to the sub-databases; determination means for determining whether the designated classifications having a high frequency of occurrence include a designated classification that exhibits a low degree of similarity with regard to any specific classification in the degrees of similarity calculated by said first similarity calculating means; sub-database creating means which, when it has been determined by said determination means that the designated classifications having a high frequency of occurrence include a designated classification exhibiting a low degree of similarity with regard to any specific classification, is for creating a sub-database comprising a collection of data sets having this designated classification in common; second memory means for storing a specific classification common to sub-databases whenever a sub-database is created by said sub-database creating means; second similarity calculating means for calculating degrees of similarity between a designated classification contained in a search condition entered by said input unit and designated classifications that have been stored in said first memory means; third similarity calculating means for calculating degrees of similarity between a specific classification of a sub-database and specific classifications that have been stored in said second memory means; and identification-number output means for conducting a search of said sub-database by a search method used when a designated-classification degree of similarity calculated by said second similarity calculating means and a specific-classification degree of similarity calculated by said third similarity calculating means are high and the time required when a search was conducted at such time is short, and outputting said identification number of the data set having classification data conforming to the search condition.